A Hierarchical Nonparametric Bayesian Approach to Statistical Language Model Domain Adaptation

Authors

Frank D. Wood

Yee Whye Teh

Abstract

In this paper we present a doubly hierarchical Pitman-Yor process language model. Its bottom layer of hierarchy consists of multiple hierarchical Pitman-Yor process language models, one each for some number of domains. The novel top layer of hierarchy consists of a mechanism to couple together multiple language models such that they share statistical strength. Intuitively this sharing results in the “adaptation” of a latent shared language model to each domain. We introduce a general formalism capable of describing the overall model which we call the graphical Pitman-Yor process and explain how to perform Bayesian inference in it. We present encouraging language model domain adaptation results that both illustrate the potential benefits of our new model and suggest new avenues of inquiry.
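As a rough illustration of what "sharing statistical strength" can mean, the sketch below computes the standard Pitman-Yor predictive probability, (c_w - d·t_w + (θ + d·t)·P_base(w)) / (θ + c), and chains two such "restaurants": a domain model whose base distribution is a shared model. This is a toy, context-free simplification and not the graphical Pitman-Yor process the abstract describes. The vocabulary, counts, discount, concentration, the one-table-per-word-type seating, and all function names are assumptions made up for this example.

```python
def py_predictive(word, counts, tables, discount, concentration, base_prob):
    """Pitman-Yor predictive probability of `word` given a fixed seating.

    counts[w] is the number of customers eating dish w, tables[w] the number
    of tables serving w. Requires concentration + total customers > 0.
    """
    total_customers = sum(counts.values())
    total_tables = sum(tables.values())
    # Mass from existing tables serving `word`, discounted once per table.
    seated = counts.get(word, 0) - discount * tables.get(word, 0)
    # Mass for opening a new table, whose dish is drawn from the base distribution.
    new_table = concentration + discount * total_tables
    return (seated + new_table * base_prob(word)) / (concentration + total_customers)


def one_table_per_type(counts):
    """Simplifying assumption: every observed word type occupies exactly one table."""
    return {w: 1 for w, c in counts.items() if c > 0}


# Hypothetical toy data: a large general corpus and a small in-domain corpus.
VOCAB = ["the", "model", "gene", "protein"]
shared_counts = {"the": 6, "model": 3, "gene": 1}
domain_counts = {"gene": 2, "protein": 1}


def uniform(word):
    """Uniform base distribution over the toy vocabulary."""
    return 1.0 / len(VOCAB)


def shared_prob(word):
    """Shared model: backs off to the uniform distribution."""
    return py_predictive(word, shared_counts, one_table_per_type(shared_counts),
                         0.5, 1.0, uniform)


def domain_prob(word):
    """Domain model: backs off to the shared model (illustrative assumption)."""
    return py_predictive(word, domain_counts, one_table_per_type(domain_counts),
                         0.5, 1.0, shared_prob)


if __name__ == "__main__":
    for w in VOCAB:
        print(w, round(shared_prob(w), 4), round(domain_prob(w), 4))
```

In this toy setup, a word never seen in the domain data ("the") still receives probability mass through the shared model, while a domain-specific word ("protein") gets more mass in the domain model than in the shared one.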

Similar resources

A Hierarchical Bayesian Approach for Semi-supervised Discriminative Language Modeling

Discriminative language modeling provides a mechanism for differentiating between competing word hypotheses, which are usually ignored in traditional maximum likelihood estimation of N-gram language models. Discriminative language modeling usually requires manual transcription, which can be costly and slow to obtain. On the other hand, there are vast amounts of untranscribed speech data on which ...

Introducing of Dirichlet process prior in the Nonparametric Bayesian models frame work

Statistical models are used to learn about the mechanism that generates the data. Often it is assumed that the random variables y_i, i=1,…,n, are samples from a probability distribution F which belongs to a parametric class of distributions. However, in practice, a parametric model may be inappropriate to describe the data. In this setting, the parametric assumption could be r...

Hierarchical statistical language models: experiments on in-domain adaptation

We introduce a hierarchical statistical language model, represented as a collection of local models plus a general sentence model. We provide an example that mixes a trigram general model and a PFSA local model for the class of decimal numbers, described in terms of sub-word units (graphemes). This model practically extends the vocabulary of the overall model to an infinite size, but still has ...

Nonparametric Bayesian Data Analysis

We review the current state of nonparametric Bayesian inference. The discussion follows a list of important statistical inference problems, including density estimation, regression, survival analysis, hierarchical models and model validation. For each inference problem we review relevant nonparametric Bayesian models and approaches including Dirichlet process (DP) models and variations, Polya t...

Implementation of the sequence memoizer in a probabilistic programming language

The sequence memoizer (SM) is an advanced probabilistic model for discrete sequences ([13], [14]). Normally the sparsity of the training data results in overconfident estimates of observed sequences and underestimates of those deviating from but similar to the training data. Ad-hoc methods, such as Kneser–Ney smoothing, have been developed to overcome these limitations— however, by using a hier...

Publication date: 2009
